Modeling protein families using probabilistic su x trees

نویسنده

  • Gill Bejerano
چکیده

We present a method for modeling protein families by means of probabilistic suux trees (PSTs). The method is based on identifying signiicant patterns in a set of related protein sequences. The input sequences do not need to be aligned, nor is delineation of domain boundaries required. The method is automatic, and can be applied, without assuming any preliminary biological information, with surprising success. Incorporating basic biological considerations such as amino acid background probabilities, and amino acids substitution probabilities can improve the performance in some cases. The PST can serve as a predictive tool for protein sequence classiication, and for detecting conserved patterns (possibly functionally or structurally important) within protein sequences. The method was tested on one of the state of the art databases of protein families, namely, the Pfam database of HMMs, with satisfactory performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Variations on probabilistic suffix trees: statistical modeling and prediction of protein families

MOTIVATION We present a method for modeling protein families by means of probabilistic suffix trees (PSTs). The method is based on identifying significant patterns in a set of related protein sequences. The patterns can be of arbitrary length, and the input sequences do not need to be aligned, nor is delineation of domain boundaries required. The method is automatic, and can be applied, without...

متن کامل

Protein Family Classification Using Sparse Markov Transducers

We present a method for classifying proteins into families based on short subsequences of amino acids using a new probabilistic model called sparse Markov transducers (SMT). We classify a protein by estimating probability distributions over subsequences of amino acids from the protein. Sparse Markov transducers, similar to probabilistic suffix trees, estimate a probability distribution conditio...

متن کامل

Protein Family Classi cation using Sparse Markov Transducers

In this paper we present a method for classifying proteins into families using sparse Markov transducers (SMTs). Sparse Markov transducers, similar to probabilistic suÆx trees, estimate a probability distribution conditioned on an input sequence. SMTs generalize probabilistic suÆx trees by allowing for wild-cards in the conditioning sequences. Because substitutions of amino acids are common in ...

متن کامل

Shared Context Probabilistic Transducers

Recently a model for supervised learning of probabilistic transduc ers represented by su x trees was introduced However this algo rithm tends to build very large trees requiring very large amounts of computer memory In this paper we propose a new more com pact transducer model in which one shares the parameters of distri butions associated to contexts yielding similar conditional output distrib...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006